CustardPy API¶
Here we provide some small Python pieces that use the functions of CustardPy.
This notebook assumes that the output of the CustardPy tutorial is stored in the CustardPyResults_Hi-C/Juicer_hg38/ directory.
%matplotlib inline
%reload_ext autoreload
%autoreload 2
from custardpy.loadData import *
from custardpy.HiCmodule import *
from custardpy.PlotModule import *
from custardpy.plotHiCfeature_module import *
from custardpy.DirectionalRelativeFreq import *
cm = generate_cmap(['#1310cc', '#FFFFFF', '#d10a3f'])
# Specify samples to plot
samplelist = ['Control', 'siCTCF', 'siRad21']
Define the function to read the contact matrix and PC1 value from the CustardPyResults_Hi-C directory.
The JuicerMatrix function loads the contact matrix and the eigenvector (PC1). Here we define the get_samples function that returns the list of the JuicerMatrix object.
dirname = "CustardPyResults_Hi-C/Juicer_hg38"
def get_samples(samplelist, chr, norm, resolution):
samples = []
for sample in samplelist:
observed = f"{dirname}/{sample}/Matrix/intrachromosomal/{resolution}/observed.{norm}.{chr}.matrix.gz"
eigenfile = f"{dirname}/{sample}/Eigen/{resolution}/eigen.{norm}.{chr}.txt.gz"
samples.append(JuicerMatrix("RPM", observed, resolution, eigenfile=eigenfile))
return samples
Chromosome 14, 20 Mbp to 107 Mbp¶
As an example, we plot the contact matrix of the entire chromosome 14 (from 20 Mbp to 107 Mbp) with 100 kbp resolution. This setting is mainly for viewing the compartment structure.
# Load contact maps for chromosome 14
chr = "chr14"
resolution = 100000
norm = "SCALE"
samples = get_samples(samplelist, chr, norm, resolution)
nsample = len(samples)
CustardPyResults_Hi-C/Juicer_hg38/Control/Matrix/intrachromosomal/100000/observed.SCALE.chr14.matrix.gz CustardPyResults_Hi-C/Juicer_hg38/siCTCF/Matrix/intrachromosomal/100000/observed.SCALE.chr14.matrix.gz CustardPyResults_Hi-C/Juicer_hg38/siRad21/Matrix/intrachromosomal/100000/observed.SCALE.chr14.matrix.gz
s = 20000000 # start position
e = 107000000 # end position
vmin = 0 # minimum value of the color scale
vmax = 50 # maximum value of the color scale
plt.figure(figsize=(10, 5))
for i, sample in enumerate(samples):
label = samplelist[i]
ax = plt.subplot(1, 3, i+1)
mat = sample.getmatrix().loc[s:e,s:e]
im = ax.imshow(mat, clim=(vmin, vmax), cmap=generate_cmap(['#FFFFFF', '#d10a3f']))
ax.set_title(label)
pltxticks_subplot2grid(0, mat.shape[1], s, e, 5, ax=ax)
ax.set_yticks([])
plt.tight_layout()
cbar = plt.colorbar(im, ax=plt.gcf().axes, orientation='vertical', fraction=0.015, pad=0.04)
cbar.set_label('Concact frequency')
Alternatively, you can use the log scale contact frequency with the getlog function to emphasize the TAD and compartment structure.
s = 20000000 # start position
e = 107000000 # end position
vmin = 0 # minimum value of the color scale
vmax = 6 # maximum value of the color scale
plt.figure(figsize=(10, 5))
for i, sample in enumerate(samples):
label = samplelist[i]
ax = plt.subplot(1, 3, i+1)
mat = sample.getlog().loc[s:e,s:e]
im = ax.imshow(mat, clim=(vmin, vmax), cmap=generate_cmap(['#FFFFFF', '#d10a3f']))
ax.set_title(label)
pltxticks_subplot2grid(0, mat.shape[1], s, e, 5, ax=ax)
ax.set_yticks([])
plt.tight_layout()
cbar = plt.colorbar(im, ax=plt.gcf().axes, orientation='vertical', fraction=0.015, pad=0.04)
cbar.set_label('Concact frequency')
chromosome 21, 25 Mbp to 31 Mbp¶
Next, we plot the region from 25 Mbp to 31 Mbp of chromosome 21 at 25 kbp resolution. This region corresponds to the Figure 1A of [Haarhuis et al. Cell, 2017].
resolution = 25000
norm = "SCALE"
chr="chr21"
samples = get_samples(samplelist, chr, norm, resolution)
nsample = len(samples)
CustardPyResults_Hi-C/Juicer_hg38/Control/Matrix/intrachromosomal/25000/observed.SCALE.chr21.matrix.gz CustardPyResults_Hi-C/Juicer_hg38/siCTCF/Matrix/intrachromosomal/25000/observed.SCALE.chr21.matrix.gz CustardPyResults_Hi-C/Juicer_hg38/siRad21/Matrix/intrachromosomal/25000/observed.SCALE.chr21.matrix.gz
s = 25000000
e = 31000000
vmax = 100
vmin = 0
plt.figure(figsize=(12, 10))
for i, sample in enumerate(samples):
label = samplelist[i]
ax = plt.subplot(1, 3, i+1)
mat = sample.getmatrix().loc[s:e,s:e]
im = ax.imshow(mat, clim=(vmin, vmax), cmap=generate_cmap(['#FFFFFF', '#d10a3f']))
ax.set_title(label)
pltxticks_subplot2grid(0, mat.shape[1], s, e, 5, ax=ax)
ax.set_yticks([])
plt.tight_layout()
cbar = plt.colorbar(im, ax=plt.gcf().axes, orientation='vertical', fraction=0.015, pad=0.04)
cbar.set_label('Concact frequency')
The heatmaps clearly show the difference in TAD and loop structures between samples.
Plotting a sample pair in a single heatmap¶
Use the plot_SamplePair_triu function to plot a sample pair in a single heatmap.
Here we visualize Control and siRad21.
resolution=25000
s = 25000000
e = 31000000
vmax = 100
vmin = 0
sample1 = samples[0]
sample2 = samples[2]
label1 = samplelist[0]
label2 = samplelist[2]
plot_SamplePair_triu(sample1, sample2, label1, label2,
resolution, s, e, vmax, vmin)
Triangle heatmap with TADs and loops¶
Let's visualize triangle heatmaps.
CustardPy has a plot_HiC_Map function to visualize the Hi-C triangle heatmap.
Here we also visualize TADs and loops by specifying the path of the TADs and the loops called in the CustardPy workflow (see the tutorial for details). The loaded file paths are displayed in the logs, so you can verify that the correct files are being visualized.
chr="chr21"
s = 25000000
e = 31000000
resolution = 25000
norm = "SCALE"
# Parameters for drawing
nrow_now = 0
nrow_heatmap = 4
nrow = nrow_heatmap * 3
colspan_plot = 12
colspan_colorbar = 2
colspan_full = (colspan_plot + colspan_colorbar)
vmax = 100 # maximum value of the color scale
vmin = 0 # minimum value of the color scale
distancemax = 0
plt.figure(figsize=(8, 8))
for i, sample in enumerate(samples):
label = samplelist[i]
tadfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/TAD/" + norm + "/" + str(resolution) + "_blocks.bedpe"
loopfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/loops/" + norm + "/merged_loops.bedpe"
plot_HiC_Map(nrow, nrow_now, nrow_heatmap, sample, label, chr,
type, resolution, vmax, vmin, s, e, distancemax,
colspan_plot, colspan_colorbar, colspan_full,
tadfile=tadfile, loopfile=loopfile)
nrow_now += nrow_heatmap
plt.tight_layout()
CustardPyResults_Hi-C/Juicer_hg38/Control/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/Control/loops/SCALE/merged_loops.bedpe CustardPyResults_Hi-C/Juicer_hg38/siCTCF/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/siCTCF/loops/SCALE/merged_loops.bedpe CustardPyResults_Hi-C/Juicer_hg38/siRad21/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/siRad21/loops/SCALE/merged_loops.bedpe
The black dashed lines and blue circles indicate TADs and loops, respectively.
If you don't want to plot TADs and loops, simply skip loading them.
chr="chr21"
s = 25000000
e = 31000000
resolution = 25000
norm = "SCALE"
# Parameters for drawing
nrow_now = 0
nrow_heatmap = 4
nrow = nrow_heatmap * 3
colspan_plot = 12
colspan_colorbar = 2
colspan_full = (colspan_plot + colspan_colorbar)
vmax = 100 # maximum value of the color scale
vmin = 0 # minimum value of the color scale
distancemax = 0
plt.figure(figsize=(8, 8))
for i, sample in enumerate(samples):
label = samplelist[i]
plot_HiC_Map(nrow, nrow_now, nrow_heatmap, sample, label, chr,
type, resolution, vmax, vmin, s, e, distancemax,
colspan_plot, colspan_colorbar, colspan_full)
nrow_now += nrow_heatmap
plt.tight_layout()
When visualizing a broader region, the triangle becomes too large to see each TAD and loop.
In this case, set the distance_max parameter in the drawHeatmapTriangle_subplot2grid function to limit the maximum distance of the triangles.
chr="chr21"
s = 25000000
e = 40000000
resolution = 25000
norm = "SCALE"
distancemax = 3000000 # Maximum distance for visualization
# Parameters for drawing
nrow_now = 0
nrow_heatmap = 4
nrow = nrow_heatmap * 3
colspan_plot = 12
colspan_colorbar = 2
colspan_full = (colspan_plot + colspan_colorbar)
vmax = 100 # maximum value of the color scale
vmin = 0 # minimum value of the color scale
plt.figure(figsize=(10, 8))
for i, sample in enumerate(samples):
label = samplelist[i]
tadfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/TAD/" + norm + "/" + str(resolution) + "_blocks.bedpe"
loopfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/loops/" + norm + "/merged_loops.bedpe"
plot_HiC_Map(nrow, nrow_now, nrow_heatmap, sample, label, chr,
type, resolution, vmax, vmin, s, e, distancemax,
colspan_plot, colspan_colorbar, colspan_full,
tadfile=tadfile, loopfile=loopfile)
nrow_now += nrow_heatmap
plt.tight_layout()
CustardPyResults_Hi-C/Juicer_hg38/Control/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/Control/loops/SCALE/merged_loops.bedpe CustardPyResults_Hi-C/Juicer_hg38/siCTCF/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/siCTCF/loops/SCALE/merged_loops.bedpe CustardPyResults_Hi-C/Juicer_hg38/siRad21/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/siRad21/loops/SCALE/merged_loops.bedpe
figstart = 25000000
figend = 40000000
resolution = 25000
sbin = int(figstart / resolution)
ebin = int(figend / resolution)
The plot_PC1 function plots the compartment PC1. The positive and negative values indicate the compartments A and B, respectively.
norm = "SCALE"
nrow_heatmap = 4
nrow_eigen = 2
nrow = (nrow_heatmap +1 + nrow_eigen +1) *3
nrow_now = 0
colspan_plot = 12
colspan_colorbar = 2
colspan_full = (colspan_plot + colspan_colorbar)
vmax = 100
vmin = 0
plt.figure(figsize=(10, 10))
for i, sample in enumerate(samples):
label = samplelist[i]
tadfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/TAD/" + norm + "/" + str(resolution) + "_blocks.bedpe"
loopfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/loops/" + norm + "/merged_loops.bedpe"
plot_HiC_Map(nrow, nrow_now, nrow_heatmap, sample, label, chr,
type, resolution, vmax, vmin, s, e, distancemax,
colspan_plot, colspan_colorbar, colspan_full,
tadfile=tadfile, loopfile=loopfile)
nrow_now += nrow_heatmap + 1
plot_PC1(nrow, nrow_now, nrow_eigen, sample, label, sbin, ebin, colspan_plot, colspan_full)
nrow_now += nrow_eigen + 1
plt.tight_layout()
CustardPyResults_Hi-C/Juicer_hg38/Control/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/Control/loops/SCALE/merged_loops.bedpe CustardPyResults_Hi-C/Juicer_hg38/siCTCF/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/siCTCF/loops/SCALE/merged_loops.bedpe CustardPyResults_Hi-C/Juicer_hg38/siRad21/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/siRad21/loops/SCALE/merged_loops.bedpe
Single insulation score¶
figstart = 25000000
figend = 32000000
resolution = 25000
sbin = int(figstart / resolution)
ebin = int(figend / resolution)
distancemax = 3000000 # Maximum distance for visualization
norm = "SCALE"
nrow_heatmap = 4
nrow_feature = 2
nrow = (nrow_heatmap +1 + nrow_feature + 1) * 3
nrow_now = 0
colspan_plot = 12
colspan_colorbar = 1
colspan_legend = 3
colspan_full = (colspan_plot + colspan_colorbar + colspan_legend)
vmax = 100
vmin = 0
plt.figure(figsize=(10, 12))
for i, sample in enumerate(samples):
label = samplelist[i]
tadfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/TAD/" + norm + "/" + str(resolution) + "_blocks.bedpe"
loopfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/loops/" + norm + "/merged_loops.bedpe"
plot_HiC_Map(nrow, nrow_now, nrow_heatmap, sample, label, chr,
type, resolution, vmax, vmin, s, e, distancemax,
colspan_plot, colspan_colorbar, colspan_full,
tadfile=tadfile, loopfile=loopfile)
nrow_now += nrow_heatmap + 1
plot_single_insulation_score(sample, label, nrow, nrow_now, nrow_feature,
figstart, figend, sbin, ebin,
colspan_plot, colspan_colorbar, colspan_legend, colspan_full)
nrow_now += nrow_feature + 1
plt.tight_layout()
CustardPyResults_Hi-C/Juicer_hg38/Control/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/Control/loops/SCALE/merged_loops.bedpe CustardPyResults_Hi-C/Juicer_hg38/siCTCF/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/siCTCF/loops/SCALE/merged_loops.bedpe CustardPyResults_Hi-C/Juicer_hg38/siRad21/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/siRad21/loops/SCALE/merged_loops.bedpe
You can view the insulation scores of all samples in this way:
figstart = 25000000
figend = 32000000
resolution = 25000
sbin = int(figstart / resolution)
ebin = int(figend / resolution)
distancemax = 3000000 # Maximum distance for visualization
norm = "SCALE"
nrow_heatmap = 4
nrow_feature = 2
nrow = (nrow_heatmap +1) * 3 + nrow_feature
nrow_now = 0
colspan_plot = 12
colspan_colorbar = 1
colspan_legend = 3
colspan_full = (colspan_plot + colspan_colorbar + colspan_legend)
vmax = 100
vmin = 0
plt.figure(figsize=(10, 8))
for i, sample in enumerate(samples):
label = samplelist[i]
tadfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/TAD/" + norm + "/" + str(resolution) + "_blocks.bedpe"
loopfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/loops/" + norm + "/merged_loops.bedpe"
plot_HiC_Map(nrow, nrow_now, nrow_heatmap, sample, label, chr,
type, resolution, vmax, vmin, s, e, distancemax,
colspan_plot, colspan_colorbar, colspan_full,
tadfile=tadfile, loopfile=loopfile)
nrow_now += nrow_heatmap +1
plot_single_insulation_score(samples, samplelist, nrow, nrow_now, nrow_feature,
figstart, figend, sbin, ebin,
colspan_plot, colspan_colorbar, colspan_legend, colspan_full)
plt.tight_layout()
CustardPyResults_Hi-C/Juicer_hg38/Control/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/Control/loops/SCALE/merged_loops.bedpe CustardPyResults_Hi-C/Juicer_hg38/siCTCF/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/siCTCF/loops/SCALE/merged_loops.bedpe CustardPyResults_Hi-C/Juicer_hg38/siRad21/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/siRad21/loops/SCALE/merged_loops.bedpe
If you want to display the insulation score of the samples as a heatmap, use the options heatmap=True, lineplot=False.
In the heatmap, red regions indicate lower values (i.e. insulated regions).
figstart = 25000000
figend = 32000000
resolution = 25000
sbin = int(figstart / resolution)
ebin = int(figend / resolution)
distancemax = 3000000 # Maximum distance for visualization
norm = "SCALE"
nrow_heatmap = 4
nrow_feature = 3
nrow = (nrow_heatmap +1) * 3 + nrow_feature
nrow_now = 0
colspan_plot = 12
colspan_colorbar = 1
colspan_legend = 3
colspan_full = (colspan_plot + colspan_colorbar + colspan_legend)
vmax = 100
vmin = 0
plt.figure(figsize=(10, 8))
for i, sample in enumerate(samples):
label = samplelist[i]
tadfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/TAD/" + norm + "/" + str(resolution) + "_blocks.bedpe"
loopfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/loops/" + norm + "/merged_loops.bedpe"
plot_HiC_Map(nrow, nrow_now, nrow_heatmap, sample, label, chr,
type, resolution, vmax, vmin, s, e, distancemax,
colspan_plot, colspan_colorbar, colspan_full,
tadfile=tadfile, loopfile=loopfile)
nrow_now += nrow_heatmap +1
plot_single_insulation_score(samples, samplelist, nrow, nrow_now, nrow_feature,
figstart, figend, sbin, ebin,
colspan_plot, colspan_colorbar, colspan_legend, colspan_full,
heatmap=True, lineplot=False)
plt.tight_layout()
CustardPyResults_Hi-C/Juicer_hg38/Control/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/Control/loops/SCALE/merged_loops.bedpe CustardPyResults_Hi-C/Juicer_hg38/siCTCF/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/siCTCF/loops/SCALE/merged_loops.bedpe CustardPyResults_Hi-C/Juicer_hg38/siRad21/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/siRad21/loops/SCALE/merged_loops.bedpe
Multi-scale insulation scores¶
The plot_multi_insulation_score function plots multi-scale insulation scores. The plot is similar to the one above (single insulation score for multiple samples), but note that this heatmap shows the insulation score at the different scale for each sample.
figstart = 25000000
figend = 32000000
resolution = 25000
sbin = int(figstart / resolution)
ebin = int(figend / resolution)
distancemax = 3000000 # Maximum distance for visualization
norm = "SCALE"
nrow_heatmap = 4
nrow_feature = 2
nrow = (nrow_heatmap +1 + nrow_feature + 1) * 3
nrow_now = 0
colspan_plot = 12
colspan_colorbar = 1
colspan_legend = 3
colspan_full = (colspan_plot + colspan_colorbar + colspan_legend)
vmax = 100
vmin = 0
plt.figure(figsize=(9, 10))
for i, sample in enumerate(samples):
label = samplelist[i]
tadfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/TAD/" + norm + "/" + str(resolution) + "_blocks.bedpe"
loopfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/loops/" + norm + "/merged_loops.bedpe"
plot_HiC_Map(nrow, nrow_now, nrow_heatmap, sample, label, chr,
type, resolution, vmax, vmin, s, e, distancemax,
colspan_plot, colspan_colorbar, colspan_full,
tadfile=tadfile, loopfile=loopfile)
nrow_now += nrow_heatmap + 1
plot_multi_insulation_score(sample, label, nrow, nrow_now, nrow_feature,
figstart, figend, sbin, ebin,
colspan_plot, colspan_colorbar, colspan_full)
nrow_now += nrow_feature + 1
plt.tight_layout()
CustardPyResults_Hi-C/Juicer_hg38/Control/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/Control/loops/SCALE/merged_loops.bedpe CustardPyResults_Hi-C/Juicer_hg38/siCTCF/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/siCTCF/loops/SCALE/merged_loops.bedpe CustardPyResults_Hi-C/Juicer_hg38/siRad21/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/siRad21/loops/SCALE/merged_loops.bedpe
Directionality index¶
Use the plot_directionality_index function to plot the directionality index.
figstart = 25000000
figend = 32000000
resolution = 25000
sbin = int(figstart / resolution)
ebin = int(figend / resolution)
distance = 500000 # distance for DI
distancemax = 3000000 # Maximum distance for visualization
norm = "SCALE"
nrow_heatmap = 4
nrow_feature = 2
nrow = (nrow_heatmap +1 + nrow_feature + 1) * 3
nrow_now = 0
colspan_plot = 12
colspan_colorbar = 1
colspan_legend = 3
colspan_full = (colspan_plot + colspan_colorbar + colspan_legend)
vmax = 100
vmin = 0
DI_vmin = -2000 # Mimnimum value of Directionality index
DI_vmax = 2000 # Maximum value of Directionality index
plt.figure(figsize=(10, 12))
for i, sample in enumerate(samples):
label = samplelist[i]
tadfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/TAD/" + norm + "/" + str(resolution) + "_blocks.bedpe"
loopfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/loops/" + norm + "/merged_loops.bedpe"
plot_HiC_Map(nrow, nrow_now, nrow_heatmap, sample, label, chr,
type, resolution, vmax, vmin, figstart, figend, distancemax,
colspan_plot, colspan_colorbar, colspan_full,
tadfile=tadfile, loopfile=loopfile)
nrow_now += nrow_heatmap + 1
plot_directionality_index(samples, samplelist, nrow, nrow_now, nrow_feature,
sbin, ebin, figstart, figend, distance,
colspan_plot, colspan_colorbar, colspan_legend, colspan_full,
vmin=DI_vmin, vmax=DI_vmax)
nrow_now += nrow_feature + 1
plt.tight_layout()
CustardPyResults_Hi-C/Juicer_hg38/Control/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/Control/loops/SCALE/merged_loops.bedpe CustardPyResults_Hi-C/Juicer_hg38/siCTCF/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/siCTCF/loops/SCALE/merged_loops.bedpe CustardPyResults_Hi-C/Juicer_hg38/siRad21/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/siRad21/loops/SCALE/merged_loops.bedpe
figstart = 25000000
figend = 32000000
resolution = 25000
sbin = int(figstart / resolution)
ebin = int(figend / resolution)
distance = 500000 # distance for DI
distancemax = 3000000 # Maximum distance for visualization
norm = "SCALE"
nrow_heatmap = 4
nrow_feature = 3
nrow = (nrow_heatmap +1) * 3 + nrow_feature
nrow_now = 0
colspan_plot = 12
colspan_colorbar = 1
colspan_legend = 1
colspan_full = (colspan_plot + colspan_colorbar + colspan_legend)
vmax = 100
vmin = 0
plt.figure(figsize=(10, 8))
for i, sample in enumerate(samples):
label = samplelist[i]
tadfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/TAD/" + norm + "/" + str(resolution) + "_blocks.bedpe"
loopfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/loops/" + norm + "/merged_loops.bedpe"
plot_HiC_Map(nrow, nrow_now, nrow_heatmap, sample, label, chr,
type, resolution, vmax, vmin, s, e, distancemax,
colspan_plot, colspan_colorbar, colspan_full,
tadfile=tadfile, loopfile=loopfile)
nrow_now += nrow_heatmap +1
plot_directionality_index(samples, samplelist, nrow, nrow_now, nrow_feature,
sbin, ebin, figstart, figend, distance,
colspan_plot, colspan_colorbar, colspan_legend, colspan_full,
vmin=DI_vmin, vmax=DI_vmax, heatmap=True, lineplot=False)
plt.tight_layout()
CustardPyResults_Hi-C/Juicer_hg38/Control/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/Control/loops/SCALE/merged_loops.bedpe CustardPyResults_Hi-C/Juicer_hg38/siCTCF/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/siCTCF/loops/SCALE/merged_loops.bedpe CustardPyResults_Hi-C/Juicer_hg38/siRad21/TAD/SCALE/25000_blocks.bedpe CustardPyResults_Hi-C/Juicer_hg38/siRad21/loops/SCALE/merged_loops.bedpe
Log ratio heatmap¶
To directly examine the depletion effect in the contact matrix, we visualize the log ratio heatmap here.
First, create log-ratio matrix using the make3dmatrixRatio function. The smooth_median_filter parameter is used to smooth the data to mitigate the low coverage of the sparse matrix.
smooth_median_filter = 3
Combined = make3dmatrixRatio(samples, smooth_median_filter)
labels = ["siCTCF", "siRad21"]
The Combined object stores the log-ratio matrix of siCTCF/Control and siRad21/Control.
You can use it to view the log-ratio heatmap as shown below.
figstart = 27500000
figend = 32500000
resolution = 25000
sbin = int(figstart / resolution)
ebin = int(figend / resolution)
vmax = 1
vmin = -1
plt.figure(figsize=(6, 3))
for i in range(len(labels)):
ax = plt.subplot(1, 2, i+1)
im = ax.imshow(Combined[i, sbin:ebin,sbin:ebin], clim=(vmin, vmax), cmap=cm)
ax.set_title(labels[i])
pltxticks_subplot2grid(0, ebin-sbin, figstart, figend, 5, ax=ax)
ax.set_yticks([])
plt.tight_layout()
cbar = plt.colorbar(im, ax=plt.gcf().axes, orientation='vertical', fraction=0.015, pad=0.04)
cbar.set_label('log enrichment')
Red and blue regions indicate increased and decreased contact frequency, respectively, after depletion.
To draw the triangle heatmap, use the drawHeatmapTriangle_subplot2grid function.
nrow = 8
nrow_now = 0
colspan_full = 30
figstart = 27500000
figend = 32500000
plt.figure(figsize=(6, 8))
for i, sample in enumerate(Combined):
ax = plt.subplot2grid((nrow, colspan_full), (nrow_now, 0), rowspan=2, colspan=24)
drawHeatmapTriangle_subplot2grid(sample, resolution,
figstart=figstart, figend=figend,
vmax=1, vmin=-1, cmap=cm,
label=labels[i], xticks=True,
logratio=True, control_label='Control')
nrow_now += 2
plt.tight_layout()
Directional Relative Frequency¶
Recently, we have defined a new score Directional Relative Frequency (DRF) that captures the asymmetric changes in inter-TAD interactions after depletion [Nakato et al, Nat Commun., 2023].
The plot_directional_relative_frequency function visualizes the line plot and heatmap of the DRF.
figstart = 25000000
figend = 40000000
resolution = 25000
sbin = int(figstart / resolution)
ebin = int(figend / resolution)
distancemax = 5000000 # Maximum distance for visualization
norm = "SCALE"
nrow_heatmap = 4
nrow_feature = 3
nrow = (nrow_heatmap +1 + nrow_feature + 1) * 3
nrow_now = 0
colspan_plot = 12
colspan_colorbar = 2
colspan_legend = 2
colspan_full = (colspan_plot + colspan_colorbar + colspan_legend)
vmax = 1
vmin = -1
plt.figure(figsize=(10, 12))
for i, sample in enumerate(Combined):
label = samplelist[i]
heatmap_ax = plt.subplot2grid((nrow, colspan_full), (nrow_now, 0),
rowspan=nrow_heatmap, colspan=colspan_plot)
colorbar_ax = plt.subplot2grid((nrow, colspan_full), (nrow_now, colspan_plot +1),
rowspan=nrow_heatmap, colspan=colspan_colorbar)
drawHeatmapTriangle_subplot2grid(sample, resolution, figstart=figstart, figend=figend,
vmax=vmax, vmin=vmin, cmap=cm, label=label, xticks=True,
logratio=True, control_label='Control',
heatmap_ax=heatmap_ax, colorbar_ax=colorbar_ax)
nrow_now += nrow_heatmap + 1
plot_directional_relative_frequency(samples, samplelist, nrow, nrow_now, nrow_feature,
sbin, ebin, figstart, figend, resolution,
True, True,
colspan_plot, colspan_colorbar, colspan_legend, colspan_full)
plt.tight_layout()
The DRF does not distinguish whether the right side is enriched or the left side is depleted when the score is high. By using the plot_triangle_ratio_multi function, it is possible to visualize the scores on both the left and right sides.
figstart = 25000000
figend = 40000000
resolution = 25000
sbin = int(figstart / resolution)
ebin = int(figend / resolution)
distancemax = 5000000 # Maximum distance for visualization
nrow_heatmap = 4
nrow_feature = 3
nrow = (nrow_heatmap +1 + nrow_feature + 1) * 3
nrow_now = 0
colspan_plot = 12
colspan_colorbar = 1
colspan_legend = 3
colspan_full = (colspan_plot + colspan_colorbar + colspan_legend)
vmin = -1
vmax = 1
plt.figure(figsize=(10, 12))
plot_triangle_ratio_multi(samples, samplelist, nrow, nrow_now, nrow_heatmap, nrow_feature,
sbin, ebin, figstart, figend, distancemax, resolution,
vmin, vmax, colspan_plot, colspan_colorbar, colspan_legend, colspan_full)
plt.tight_layout()
To obtain the distribution of the DRF, you can use the get_drf_matrix function.
drf_right = True
drf_left = True
DRFMatrix = get_drf_matrix(samples, resolution, drf_right, drf_left)
plt.figure(figsize=(10, 3))
for i, sample in enumerate(DRFMatrix):
plt.plot(sample, label=labels[i])
plt.legend()
<matplotlib.legend.Legend at 0x7f6514989e10>
import session_info
session_info.show()
Click to view session information
----- custardpy 0.7.1 matplotlib 3.8.3 numpy 1.25.2 pandas 1.5.3 scipy 1.10.1 session_info 1.0.0 sklearn 1.2.2 -----
Click to view modules imported as dependencies
PIL 10.2.0 anyio NA arrow 1.3.0 asttokens NA attr 23.1.0 attrs 23.1.0 babel 2.11.0 bottleneck 1.3.7 brotli 1.0.9 certifi 2024.02.02 cffi 1.16.0 charset_normalizer 2.0.4 colorama 0.4.6 comm 0.2.1 cycler 0.12.1 cython_runtime NA dateutil 2.8.2 debugpy 1.6.7 decorator 5.1.1 defusedxml 0.7.1 exceptiongroup 1.2.0 executing 0.8.3 fastjsonschema NA fqdn NA idna 3.4 ipykernel 6.28.0 isoduration NA jedi 0.18.1 jinja2 3.1.3 joblib 1.3.2 json5 NA jsonpointer 2.1 jsonschema 4.19.2 jsonschema_specifications NA jupyter_events 0.8.0 jupyter_server 2.10.0 jupyterlab_server 2.25.1 kiwisolver 1.4.4 markupsafe 2.0.1 matplotlib_inline 0.1.6 mpl_toolkits NA nbformat 5.9.2 numexpr 2.8.7 overrides NA packaging 23.1 parso 0.8.3 pexpect 4.8.0 pkg_resources NA platformdirs 3.10.0 prometheus_client NA prompt_toolkit 3.0.43 psutil 5.9.8 ptyprocess 0.7.0 pure_eval 0.2.2 pydev_ipython NA pydevconsole NA pydevd 2.9.5 pydevd_file_utils NA pydevd_plugins NA pydevd_tracing NA pygments 2.15.1 pyparsing 3.0.9 pythonjsonlogger NA pytz 2023.3.post1 referencing NA requests 2.31.0 rfc3339_validator 0.1.4 rfc3986_validator 0.1.1 rpds NA ruamel NA send2trash NA simplejson 3.17.6 six 1.16.0 sniffio 1.3.0 sphinxcontrib NA stack_data 0.2.0 threadpoolctl 2.2.0 tornado 6.3.3 traitlets 5.7.1 typing_extensions NA uri_template NA urllib3 2.1.0 wcwidth 0.2.5 webcolors 1.13 websocket 1.7.0 yaml 6.0.1 zmq 24.0.1 zoneinfo NA zstandard 0.19.0
----- IPython 8.20.0 jupyter_client 8.6.0 jupyter_core 5.5.0 jupyterlab 4.0.11 notebook 7.0.8 ----- Python 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 15:36:39) [GCC 12.3.0] Linux-6.2.0-39-generic-x86_64-with-glibc2.35 ----- Session information updated at 2024-03-03 18:43